Abstract — We consider large-scale wireless sensor networks with n nodes, out of which k are in possession of (e.g., have sensed or collected in some other way) k information packets. In scenarios in which network nodes are vulnerable because of, for example, limited energy or a hostile environment, it is desirable to disseminate the acquired information throughout the network so that each of the n nodes stores one (possibly coded) packet and the original k source packets can be recovered, locally and in a computationally simple way, from any k(1 + ε) nodes for some small ε > 0. We develop decentralized Fountain-codes-based algorithms to solve this problem. Unlike all previously developed schemes, our algorithms are truly distributed, that is, nodes do not know n, k, or the connectivity of the network, except in their own neighborhoods, and they do not maintain any routing tables.




Fig. 1. A sensor network has 25 sensors (big dots) monitoring an area and 
225 storage nodes (small dots). A good distributed storage algorithm should 
enable us to recover the original 25 source packets from any 25+ nodes (e.g., 
the set of nodes within any one of the three illustrated circular regions). 
I. Introduction

Wireless sensor networks consist of small devices (sensors) 
with limited resources (e.g., low CPU power, small bandwidth, 
limited battery and memory). They are mainly used to monitor 
and detect objects, fires, temperatures, floods, and other phe- 
nomena [1], often in challenging environments where human 
involvement is limited. Consequently, data acquired by sensors may have a short lifetime, and any processing of such data within the network should have low complexity and power consumption [1].

Consider a wireless sensor network with n sensors, where k sensors collect (sense) independent information. Because of the network vulnerability and/or inaccessibility, it is desirable to disseminate the acquired information throughout the network so that each of the n nodes stores one (possibly coded) packet and the original k source packets can be recovered in a computationally simple way from any k(1 + ε) nodes for some small ε > 0. Two such scenarios are of particular practical interest: the information acquired by the k sensors should be recoverable (1) locally, from any neighborhood containing k(1 + ε) nodes, or (2) from the last k(1 + ε) surviving nodes. Fig. 1 illustrates such an example.

Many algorithms have been proposed to solve related dis- 
tributed storage problems using coding with either centralized 
or mostly decentralized control. Reed-Solomon based schemes 
have been proposed in [2]-[5] and Low-Density Parity Check 
codes based schemes in [6]-[8], and references therein. 
Fountain codes have also been considered because they are rateless and because of their coding efficiency and low complexity. In [9], Dimakis et al. proposed a decentralized implementation of Fountain codes that uses fast random walks to disseminate source data to the storage nodes and geographic routing over a grid, which requires every node to know its location. In [10], Lin et al. proposed a solution employing random walks with stops, and used the Metropolis algorithm to specify the transition probabilities of the random walks.

In another line of work, Kamra et al. in [11] proposed 
a novel technique called growth coding to increase data 
persistence in wireless sensor networks, that is, the amount 
of information that can be recovered at any storage node at 
any time period whenever there is a failure in some other 
nodes. In [12], Lin et al. described how to differentiate 
data persistence using random linear codes. Network coding has also been considered for distributed storage in various network scenarios [13]-[17].

All previous work assumes some access to global information, for example, the total numbers of nodes and sources, which, for large-scale wireless sensor networks, may not be easily obtained or updated by each individual sensor. For example, in [10], knowledge of the total number of sensors n and the number of sources k is required to calculate the number of random walks that each source has to initiate and the probability of trapping data at each sensor. Knowledge of the maximum node degree (i.e., the maximum number of node neighbors) of the graph is also required to perform the Metropolis algorithm. Furthermore, the algorithms proposed in [10] require each sensor to perform encoding only after receiving enough source packets. This forces each sensor to maintain a large temporary memory buffer, which may not be practical in real sensor networks. By contrast, the algorithms proposed in this paper require no global information.

In this paper, we propose two new algorithms to solve 
the distributed storage problem for large-scale wireless sen- 
sor networks: LT-Codes based distributed storage (LTCDS) 
algorithm and Raptor Codes based distributed storage (RCDS) 
algorithm. Both algorithms employ simple random walks. 
Unlike all previously developed schemes, both LTCDS and 
RCDS algorithms are truly distributed. That is, except for 
their own neighborhoods, sensors do not need to know any 
global information, e.g., the total number of sensors n, the 
number of sources k, or routing tables. Moreover, in both 
algorithms, instead of waiting until all the necessary source 
packets have been collected to perform encoding, each sensor 
makes decisions and performs encoding upon each reception 
of a source packet. This mechanism significantly reduces the 
node's storage requirements. 

The remainder of this paper is organized as follows. In Sec. II we introduce the network and coding models. In Sec. III we present the LTCDS algorithm and provide its performance analysis. In Sec. IV we present the RCDS algorithm. In Sec. V we present simulation results for various performance measures of the proposed algorithms.

II. Network and Coding Models 

We model a wireless sensor network consisting of n nodes as a random geometric graph [18], [19], as follows. The nodes are distributed uniformly at random on the plane, and all have communication radius 1. Thus, two nodes are neighbors and can communicate iff their distance is at most 1. Among the n nodes, there are k source nodes (picked uniformly and independently from the n nodes) that have independent information to be disseminated throughout the network for storage. A similar model was considered in [10]. Our algorithms and results apply to many network topologies, e.g., the regular grids of [3].

We assume that no node has knowledge about the locations of other nodes and no routing table is maintained; thus the algorithm proposed in [3] cannot be applied. Moreover, we assume that no node has any global information, e.g., the total number of nodes n, the total number of sources k, or the maximal number of neighbors in the network. Hence, the algorithms proposed in [10] cannot be applied. We assume that each node knows its neighbors. Let N(u) denote the set of neighbors of u. We will refer to the number of neighbors of u as the node degree of u, and denote it by μ(u) = |N(u)|. The mean degree of a graph G is then given by

\bar{\mu} = \frac{1}{n} \sum_{u \in G} \mu(u).   (1)



For k source blocks {x_1, ..., x_k} and a probability distribution Ω over the set {1, ..., k}, a Fountain code with parameters (k, Ω) is a potentially limitless stream of output blocks {y_1, y_2, ...} [20], [21]. Each output block is generated by XORing d randomly and independently chosen source blocks, where d is drawn from Ω(d).
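The encoding rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: source blocks are modelled as Python ints (bit strings), the function name `lt_encode_block` is hypothetical, and, as in standard LT codes [20], the d source blocks are drawn without replacement.

```python
import random
from typing import List, Sequence, Tuple


def lt_encode_block(source: Sequence[int], omega: Sequence[float],
                    rng: random.Random) -> Tuple[List[int], int]:
    """Generate one Fountain (LT) output block.

    source -- the k source blocks, each modelled as an int bit string
    omega  -- omega[d - 1] = Omega(d) for d = 1, ..., k
    Returns the indices of the XORed source blocks and the output block.
    """
    k = len(source)
    # Draw the degree d from Omega.
    d = rng.choices(range(1, k + 1), weights=omega)[0]
    # Pick d distinct source blocks uniformly at random.
    idx = sorted(rng.sample(range(k), d))
    y = 0
    for i in idx:
        y ^= source[i]  # the output block is the XOR of the chosen blocks
    return idx, y


rng = random.Random(1)
src = [0b1010, 0b0110, 0b1111, 0b0001]
omega = [0.25, 0.5, 0.25, 0.0]  # a toy degree distribution over {1, ..., 4}
stream = [lt_encode_block(src, omega, rng) for _ in range(8)]
```

Because the stream is generated on demand, any number of output blocks can be drawn, which is the "rateless" property referred to in the Introduction.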

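LT (Luby Transform) codes [20], [21] are Fountain codes that employ either the Ideal Soliton distribution

\Omega_{is}(d) = \begin{cases} 1/k, & d = 1, \\ 1/[d(d-1)], & d = 2, 3, \ldots, k, \end{cases}   (2)

or the Robust Soliton distribution, which is defined as follows. Let R = c_0 ln(k/δ)√k, where c_0 is a suitable constant and 0 < δ < 1. Define

\tau(d) = \begin{cases} R/(dk), & d = 1, \ldots, k/R - 1, \\ R \ln(R/\delta)/k, & d = k/R, \\ 0, & d = k/R + 1, \ldots, k. \end{cases}   (3)

The Robust Soliton distribution is given by

\Omega_{rs}(d) = \frac{\Omega_{is}(d) + \tau(d)}{Z}, \quad d = 1, 2, \ldots, k,   (4)

where Z = \sum_{d=1}^{k} [\Omega_{is}(d) + \tau(d)] is the normalizing constant.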
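Both distributions can be tabulated directly from (2)-(4). The sketch below is illustrative; the function names are hypothetical, and because k/R is in general not an integer, placing the spike of τ at round(k/R) is an assumption made here rather than something stated in the text.

```python
import math
from typing import List


def ideal_soliton(k: int) -> List[float]:
    """Omega_is(d) from (2); entry d - 1 of the list holds Omega_is(d)."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def robust_soliton(k: int, c0: float, delta: float) -> List[float]:
    """Omega_rs(d) from (3)-(4), with R = c0 * ln(k / delta) * sqrt(k).

    Assumption of this sketch: the spike of tau is placed at round(k / R).
    """
    R = c0 * math.log(k / delta) * math.sqrt(k)
    spike = round(k / R)
    if not 1 <= spike <= k:
        raise ValueError("c0 and delta must give 1 <= k/R <= k")
    tau = [0.0] * k
    for d in range(1, spike):                      # d = 1, ..., k/R - 1
        tau[d - 1] = R / (d * k)
    tau[spike - 1] = R * math.log(R / delta) / k   # d = k/R
    rho = ideal_soliton(k)
    z = sum(rho) + sum(tau)                        # normalizing constant Z in (4)
    return [(r + t) / z for r, t in zip(rho, tau)]


rs = robust_soliton(k=1000, c0=0.1, delta=0.05)
```

The Ideal Soliton list sums to one by telescoping, since 1/k + Σ_{d=2}^{k} [1/(d−1) − 1/d] = 1; the Robust Soliton list sums to one by construction of Z.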



Raptor codes are concatenated codes whose inner codes are 
LT and outer codes are traditional erasure correcting codes. 
They have linear encoding and decoding complexity [21]. 

If each node in the network ends up storing an LT or Raptor code output block corresponding to the k source blocks, then the k source blocks can be recovered in a computationally simple way from any k(1 + ε) nodes for some small ε > 0 [20], [21]. For different goals, different distributions Ω may be of interest. Our storage algorithm can take any Ω as its input.

III. LT Codes Based Algorithms 
A. Algorithm Design 

The goal of our storage algorithm is to have each of the n nodes store an LT code output block corresponding to the k input (source) blocks without the involvement of a central authority. To achieve this goal, a node in the network would have to store, with probability Ω(d), a binary sum (XOR) of d randomly and independently chosen source packets. Our main idea for approaching this goal in a decentralized way is to (1) disseminate the k source packets throughout the network by k simple random walks and (2) have each node XOR a packet "walking" through it into its storage with probability d/k, where d is chosen at the node randomly according to Ω.

To ensure that each of the k random walks visits each network node at least once, we let the random walks last longer than the network (graph) cover time [22], [23].

Definition 1: (Cover Time) Given a graph G, let T_cover(u) be the expected length of a simple random walk that starts at node u and visits every node in G at least once. The cover time of G is defined by T_cover(G) = max_{u∈G} T_cover(u).

Lemma 2 (Avin and Ercal [24]): Given a random geometric graph G with n nodes, if it is a connected graph with high probability, then

T_{cover}(G) = \Theta(n \log n).   (5)
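Lemma 2 can be probed empirically. The sketch below builds a random geometric graph in the setting of Sec. II (n = 200 nodes on [0, 5]^2 with radius 1, values borrowed from the simulations in Sec. V), redraws it until it is connected, and counts how many steps one simple random walk needs to visit every node. This is a single-sample illustration under these assumptions, not a measurement of T_cover(G), which is an expectation maximized over starting nodes; the helper names are hypothetical.

```python
import math
import random
from collections import deque
from typing import List


def random_geometric_graph(n: int, side: float, radius: float,
                           rng: random.Random) -> List[List[int]]:
    """n nodes uniform on [0, side]^2; u and v are neighbours iff distance <= radius."""
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adj: List[List[int]] = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(pts[u], pts[v]) <= radius:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def is_connected(adj: List[List[int]]) -> bool:
    """Breadth-first search from node 0."""
    seen = {0}
    queue = deque([0])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def steps_to_cover(adj: List[List[int]], start: int, rng: random.Random) -> int:
    """Steps taken by one simple random walk from `start` until all nodes are visited."""
    visited = {start}
    u, steps = start, 0
    while len(visited) < len(adj):
        u = rng.choice(adj[u])
        visited.add(u)
        steps += 1
    return steps


rng = random.Random(7)
n = 200
adj = random_geometric_graph(n, side=5.0, radius=1.0, rng=rng)
while not is_connected(adj):          # Lemma 2 assumes a connected graph
    adj = random_geometric_graph(n, side=5.0, radius=1.0, rng=rng)
steps = steps_to_cover(adj, start=0, rng=rng)
ratio = steps / (n * math.log(n))     # compare with the Theta(n log n) scaling in (5)
```

Repeating the run over several graphs and increasing n is one way to see whether the ratio stays bounded, as (5) suggests; a single run is only indicative.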



IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, TO APPEAR IN 2010

In addition, the probability that a random walk on G will require more time than T_cover(G) to visit every node of G is O(1/(n log n)) [22]. Therefore, we can virtually ensure that a random walk visits each network node by requiring that it makes C_1 n log n steps for some C_1 > 0. To implement this requirement for the k random walks, we set a counter for each source packet and increment it after each transmission. Each time a node receives a packet whose counter is smaller than C_1 n log n, it accepts the packet for storage with probability d/k (where d is chosen at the node according to Ω), and then, regardless of the acceptance decision, it forwards the packet to one of its randomly chosen neighbors. Packets whose counters have reached C_1 n log n are discarded.
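A minimal simulation of this known-(n, k) procedure is sketched below. Several simplifications are assumptions of the sketch rather than statements from the text: the walks are simulated one after another (no queueing), each node makes its accept/reject decision only on the first visit of a given source packet (so that the number of stored packets follows the binomial mixture analyzed in Sec. III-B), the graph is a small grid (Sec. II notes that the algorithms also apply to grids), and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def grid_graph(side: int) -> List[List[int]]:
    """Adjacency lists of a side x side grid; node id = row * side + col."""
    adj: List[List[int]] = [[] for _ in range(side * side)]
    for r in range(side):
        for c in range(side):
            u = r * side + c
            if c + 1 < side:
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < side:
                adj[u].append(u + side)
                adj[u + side].append(u)
    return adj


def disseminate_known_nk(adj: List[List[int]], sources: Dict[int, int],
                         omega: Sequence[float], c1: float,
                         rng: random.Random) -> Tuple[List[Set[int]], List[int]]:
    """Simulate k random walks with life counters and probabilistic XOR storage.

    sources maps a source node id to its packet payload (an int bit string).
    Returns, per node, the set of stored source ids and the stored XOR y_u.
    """
    n, k = len(adj), len(sources)
    max_steps = int(c1 * n * math.log(n))                  # C_1 n log n
    d = rng.choices(range(1, k + 1), weights=omega, k=n)   # d drawn at each node
    stored: List[Set[int]] = [set() for _ in range(n)]
    y = [0] * n
    for s, payload in sources.items():
        decided: Set[int] = set()   # nodes that already decided on x_s
        u = rng.choice(adj[s])      # the source sends x_s to a random neighbour
        for _ in range(max_steps):  # the packet lives while c(x_s) < C_1 n log n
            if u not in decided:
                decided.add(u)
                if rng.random() < d[u] / k:   # accept with probability d/k
                    y[u] ^= payload
                    stored[u].add(s)
            u = rng.choice(adj[u])  # forward regardless of the decision
    return stored, y


rng = random.Random(3)
adj = grid_graph(10)                                   # n = 100
sources = {s: rng.getrandbits(32) for s in rng.sample(range(100), 10)}
k = len(sources)
ideal = [1.0 / k] + [1.0 / (j * (j - 1)) for j in range(2, k + 1)]  # eq. (2)
stored, y = disseminate_known_nk(adj, sources, ideal, c1=3.0, rng=rng)
```

Keeping the set of stored source ids next to y_u is only for inspection; a real node stores y_u and the ids in the packet header.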

Note that the above procedure requires knowledge of n and k at each node. To devise a fully decentralized storage algorithm, we note that each node can observe (1) how often it receives packets and (2) how often it receives packets from each source. Naturally, one expects these numbers to depend on the network connectivity (μ(u) for all u), the size of the graph n, and the number of different random walks k. We next describe this dependence and show how it can be used to obtain local estimates of global parameters.

The following definitions and claims either come from [22], 
[23], [25], or can be easily derived based on the results therein. 

Definition 3: (Inter-Visit Time) For a random walk on a graph, the inter-visit time of node u, T_visit(u), is the amount of time between any two consecutive visits of the walk to u.

Lemma 4: For a node u with node degree μ(u) in a random geometric graph, the mean inter-visit time is

E[T_{visit}(u)] = \frac{\bar{\mu} n}{\mu(u)},   (6)

where μ̄ is the mean degree of the graph given by (1).

Lemma 4 implies n = μ(u)E[T_visit(u)]/μ̄. While node u can easily measure E[T_visit(u)], the mean degree μ̄ is a piece of global information and may be hard to obtain. Thus we make a further approximation, μ̄ ≈ μ(u), and let the estimate of n by node u be

\hat{n}(u) = E[T_{visit}(u)].   (7)

Note that to estimate n, it is enough to consider only one of the k random walks. To estimate k, however, we need to consider the k walks jointly, without distinguishing between packets originating from different sources.
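Lemma 4, and the error introduced by the approximation μ̄ ≈ μ(u) behind (7), can be checked numerically. The sketch below runs one simple random walk on a 5 × 5 grid (chosen only because its degrees are easy to count; the grid, the walk length, and the function names are assumptions of this example) and compares the measured mean inter-visit time at the centre node with μ̄n/μ(u).

```python
import random
from typing import List


def grid_graph(side: int) -> List[List[int]]:
    """Adjacency lists of a side x side grid; node id = row * side + col."""
    adj: List[List[int]] = [[] for _ in range(side * side)]
    for r in range(side):
        for c in range(side):
            u = r * side + c
            if c + 1 < side:
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < side:
                adj[u].append(u + side)
                adj[u + side].append(u)
    return adj


def mean_intervisit_time(adj: List[List[int]], u: int, steps: int,
                         rng: random.Random) -> float:
    """Average gap between consecutive visits of one simple random walk to u."""
    pos, last, gaps = u, 0, []   # the walk starts at u (a visit at time 0)
    for t in range(1, steps + 1):
        pos = rng.choice(adj[pos])
        if pos == u:
            gaps.append(t - last)
            last = t
    return sum(gaps) / len(gaps)


rng = random.Random(11)
adj = grid_graph(5)                        # n = 25
n = len(adj)
mu = [len(nb) for nb in adj]               # node degrees mu(u)
mu_bar = sum(mu) / n                       # eq. (1); 3.2 for this grid
u = 12                                     # centre node, mu(u) = 4
lemma4 = mu_bar * n / mu[u]                # eq. (6)
t_visit = mean_intervisit_time(adj, u, 200_000, rng)
n_hat = t_visit                            # eq. (7): assumes mu_bar ~= mu(u)
n_from_lemma4 = mu[u] * t_visit / mu_bar   # exact inversion of (6), needs mu_bar
```

Since μ(u) = 4 exceeds μ̄ = 3.2 here, the local estimate n̂(u) lands near 20 rather than n = 25, while inverting (6) with the true μ̄ recovers n closely; the approximation is most accurate at nodes of typical degree.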

Definition 5: (Inter-Packet Time) For multiple random walks on a graph, the inter-packet time of node u, T_packet(u), is the amount of time between any two consecutive visits by any of the walks to u.

Lemma 6: For a node u with node degree μ(u) in a random geometric graph with k simple random walks, the mean inter-packet time is

E[T_{packet}(u)] = \frac{E[T_{visit}(u)]}{k} = \frac{\bar{\mu} n}{k \mu(u)},   (8)

where μ̄ is the mean degree of the graph given by (1).

Proof: For a given node u, each of the k random walks has expected inter-visit time μ̄n/μ(u). We now view this process from another perspective: we assume there are k nodes {v_1, ..., v_k} uniformly distributed in the network and an agent from node u following a simple random walk. Then the expected inter-visit time for this agent to visit any particular v_i is the same as μ̄n/μ(u). However, the expected time between two consecutive visits of the agent to any of the k nodes is k times smaller, (1/k)·μ̄n/μ(u), which gives (8). □

Based on Lemmas 4 and 6, that is, equations (6) and (8), each node u can estimate k as

\hat{k}(u) = E[T_{visit}(u)] / E[T_{packet}(u)].   (9)
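Equation (9) can likewise be illustrated with k walks run in parallel. In the sketch below (the 8 × 8 grid, k = 6, the random starting nodes, the run length, and the function name are all assumptions of the example), node u measures E[T_visit(u)] from the first walk only, as suggested above, and E[T_packet(u)] from the visits of all walks; their ratio estimates k.

```python
import random
from typing import List


def grid_graph(side: int) -> List[List[int]]:
    """Adjacency lists of a side x side grid; node id = row * side + col."""
    adj: List[List[int]] = [[] for _ in range(side * side)]
    for r in range(side):
        for c in range(side):
            u = r * side + c
            if c + 1 < side:
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < side:
                adj[u].append(u + side)
                adj[u + side].append(u)
    return adj


def estimate_k_at(adj: List[List[int]], u: int, k: int, steps: int,
                  rng: random.Random) -> float:
    """k_hat(u) = E[T_visit(u)] / E[T_packet(u)] from k parallel simple random walks."""
    walkers = [rng.randrange(len(adj)) for _ in range(k)]
    first_walk_visits = 0   # visits of walk 0 only (for T_visit)
    any_walk_visits = 0     # visits by any walk (for T_packet)
    for _ in range(steps):
        for i in range(k):
            walkers[i] = rng.choice(adj[walkers[i]])
            if walkers[i] == u:
                any_walk_visits += 1
                if i == 0:
                    first_walk_visits += 1
    t_visit = steps / first_walk_visits   # empirical mean inter-visit time
    t_packet = steps / any_walk_visits    # empirical mean inter-packet time
    return t_visit / t_packet             # eq. (9)


rng = random.Random(5)
adj = grid_graph(8)                       # n = 64; node 27 is interior (degree 4)
k_hat = estimate_k_at(adj, u=27, k=6, steps=100_000, rng=rng)
```

The estimate fluctuates around the true k = 6; its accuracy is governed by how many visits of the first walk are observed, which is the role played by C_2 in the algorithm below.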

We are now ready to state the entire storage algorithm: 



Definition 7: (LTCDS Algorithm) with system parameters C_1, C_2 > 0:

Initialization Phase

Each source node s, s = 1, ..., k,

1) attaches a header to its data x_s, containing its ID and a life-counter c(x_s) set to zero, and then

2) sends its packet to a randomly selected neighbor.

Each node u sets its storage y_u = 0.

Inference Phase (at all nodes u)

1) Suppose x_{s(u)_1} is the first source packet that visits u, and denote by t^{(j)}_{s(u)_1} the time when x_{s(u)_1} makes its j-th visit to u. Concurrently, u maintains a record of visiting times for all packets x_{s(u)_i} "walking" through it. Let t^{(j)}_{s(u)_i} be the time when source packet x_{s(u)_i} makes its j-th visit to u. After x_{s(u)_1} visits u C_2 times, where C_2 > 0 is a system parameter, u stops this monitoring and recording procedure. Denote by k(u) the number of source packets that have visited u at least once until that time.

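2) Let J(s(u)_i) be the number of visits of source packet x_{s(u)_i} to u and let

\bar{T}_{s(u)_i} = \frac{1}{J(s(u)_i) - 1} \sum_{j=1}^{J(s(u)_i) - 1} \left( t^{(j+1)}_{s(u)_i} - t^{(j)}_{s(u)_i} \right).   (10)

Then, the average inter-visit time for node u is

\bar{T}_{visit}(u) = \frac{1}{k(u)} \sum_{i=1}^{k(u)} \bar{T}_{s(u)_i}.   (11)

Let J_min = min_i {J(s(u)_i)}. Then the inter-packet time is

\bar{T}_{packet}(u) = \frac{\max_i \{ t^{(J_{min})}_{s(u)_i} \} - \min_i \{ t^{(1)}_{s(u)_i} \}}{k(u) J_{min}},   (12)

and u can estimate n and k as

\hat{n}(u) = \bar{T}_{visit}(u) \quad \text{and} \quad \hat{k}(u) = \frac{\bar{T}_{visit}(u)}{\bar{T}_{packet}(u)}.   (13)

3) In this phase, the counter c(x_s) of each source packet x_s is incremented by one after each transmission.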
Encoding and Storage Phase (at all nodes u)

1) Node u draws d_c(u) from {1, ..., k̂(u)} according to Ω.

2) Upon receiving packet x, if c(x) < C_1 n̂(u) log n̂(u), node u
• puts x into its forward queue and increments c(x);
• with probability d_c(u)/k̂(u), accepts x for storage and updates its storage variable from y_u^- to y_u^+ as

y_u^+ = y_u^- \oplus x.   (14)

If c(x) ≥ C_1 n̂(u) log n̂(u), x is removed from circulation.

3) If a node has received packets before the current round, it forwards its head-of-line (HOL) packet to a randomly chosen neighbor.

4) The encoding phase ends and the storage phase begins when each node has seen its k̂(u) source packets.
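The node-local estimates of the inference phase can be computed from the recorded visit times as sketched below. Two choices are assumptions of this sketch and deviate deliberately from the text: packets seen only once are left out of the average in (11), because (10) needs at least two visits; and the inter-packet time is taken directly from Definition 5 (the mean gap between consecutive arrivals of any packet) instead of through the J_min-based expression (12). The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def inference_estimates(visits: Dict[int, List[int]]) -> Tuple[float, float, int]:
    """Node-local estimates of n and k from recorded visit times.

    visits[s] = sorted times at which source packet x_s visited this node
    during the monitoring window of the inference phase.
    Returns (n_hat, k_hat, k_u), where k_u is the number of distinct packets seen.
    """
    k_u = len(visits)
    # Eq. (10): the telescoping sum of gaps equals (last - first) / (J - 1).
    per_packet = [(ts[-1] - ts[0]) / (len(ts) - 1)
                  for ts in visits.values() if len(ts) >= 2]
    if not per_packet:
        raise ValueError("no packet visited twice; extend the monitoring window")
    t_visit = sum(per_packet) / len(per_packet)           # eq. (11)
    # Inter-packet time per Definition 5: mean gap between any two arrivals.
    arrivals = sorted(t for ts in visits.values() for t in ts)
    t_packet = (arrivals[-1] - arrivals[0]) / (len(arrivals) - 1)
    n_hat = t_visit               # eq. (13), relying on mu_bar ~= mu(u)
    k_hat = t_visit / t_packet    # eq. (13)
    return n_hat, k_hat, k_u


# Toy record: three packets, each passing the node every 10 time units,
# monitored until the first packet's 4th visit (C_2 = 4).
visits = {7: [2, 12, 22, 32], 3: [5, 15, 25], 9: [8, 18, 28]}
n_hat, k_hat, k_u = inference_estimates(visits)
```

In this toy record n̂(u) = 10 and k̂(u) ≈ 3; only timestamps and packet IDs are needed, so the memory cost at a node is proportional to C_2 k(u) rather than to the packet payloads.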
B. Performance Analysis

Parameters (k, Ω) determine the error-rate performance and encoding/decoding complexity of the corresponding Fountain code. With input (k, Ω), the LTCDS algorithm produces a distributed Fountain code with parameters (k, Ω'), where in general Ω' ≠ Ω. We next compute Ω' when the input distribution Ω is the Robust Soliton (4), and discuss the performance and complexity of the corresponding Fountain code.

Recall that node u draws d_c(u) according to Ω and accepts a passing source packet with probability d_c(u)/k. Therefore, given d_c(u), the number d_s(u) of packets that u accepts is binomially distributed with parameters k and d_c(u)/k, and d_s(u) takes value i with probability Ω'(i):

\Omega'(i) = \sum_{d_c(u)=1}^{k} \Pr\{d_s(u) = i \mid d_c(u)\}\, \Omega(d_c(u)) = \sum_{d_c(u)=1}^{k} \binom{k}{i} \left(\frac{d_c(u)}{k}\right)^{i} \left(1 - \frac{d_c(u)}{k}\right)^{k-i} \Omega(d_c(u)).
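The mixture above is easy to evaluate numerically. The sketch below (function name hypothetical; binomial coefficients are converted to floats, which is adequate for k up to roughly 1000) computes Ω' for a given Ω. Applied to the Ideal Soliton distribution (2), it also exhibits two properties used later in this section: the mean degree is unchanged, Σ_i iΩ'(i) = Σ_d dΩ(d), and the probability Ω'(0) that a node stores nothing lies between 1/(2e^2) and e^{-1}.

```python
import math
from typing import List, Sequence


def induced_distribution(omega: Sequence[float]) -> List[float]:
    """Omega'(i), i = 0..k, when a node draws d_c from Omega and keeps each of
    the k passing source packets independently with probability d_c / k."""
    k = len(omega)
    binom = [float(math.comb(k, i)) for i in range(k + 1)]
    out = [0.0] * (k + 1)
    for d in range(1, k + 1):
        if omega[d - 1] == 0.0:
            continue
        p = d / k
        for i in range(k + 1):
            out[i] += binom[i] * p**i * (1.0 - p)**(k - i) * omega[d - 1]
    return out


k = 200
ideal = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]  # eq. (2)
omega_prime = induced_distribution(ideal)
mean_in = sum(d * w for d, w in enumerate(ideal, start=1))
mean_out = sum(i * w for i, w in enumerate(omega_prime))
```

The equality of the two means is the binomial-mean identity behind the decoding-complexity computation (16) below.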



A simple way to achieve Ω' = Ω would be to let each u store each distinct passing source packet until it collects all k, and then randomly choose exactly d_c(u) packets, where d_c(u) is drawn according to Ω. This approach would require large buffers, which is usually not practical, especially when k is large. Therefore, we assume that nodes have limited memory and let them make their decision upon each reception. Our approach, as the following theorem shows, results in a Fountain code with comparable efficiency and the same complexity as the one determined by the Robust or Ideal Soliton distributions.

Theorem 8: Suppose the LTCDS algorithm uses the Robust Soliton distribution (4) for Ω. Then, the k source packets can be recovered from any K' = βK nodes with probability 1 − δ for sufficiently large k, where β > (1 − e^{-1})^{-1} and K = k + O(√k ln²(k/δ)) (K would be sufficient for recovery when Ω' = Ω). The decoding complexity is O(k log(k/δ)).

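Proof: The probability that a node stores no information is

\Omega'(0) = \sum_{d=1}^{k} \left(1 - \frac{d}{k}\right)^{k} \Omega(d) < \sum_{d=1}^{k} e^{-d}\, \Omega(d) \le \sum_{d=1}^{k} e^{-d} \left[\Omega_{is}(d) + \tau(d)\right]
= \frac{e^{-1}}{k} + \sum_{d=2}^{k} \frac{e^{-d}}{d(d-1)} + \frac{R}{k} \sum_{d=1}^{k/R-1} \frac{e^{-d}}{d} + \frac{R \ln(R/\delta)}{k} e^{-k/R}
< \frac{e^{-1}}{k} + \sum_{d=2}^{\infty} \frac{e^{-d}}{d(d-1)} + O\left(\frac{\ln^{2}(k/\delta)}{\sqrt{k}}\right),   (15)

where the second inequality uses Z ≥ 1 in (4). Since Σ_{d=2}^{∞} e^{-d}/[d(d−1)] ≈ 0.078 < e^{-1}, for sufficiently large k we have Ω'(0) < e^{-1}. Consequently, if we randomly take K' = βK nodes from the network, where β > (1 − e^{-1})^{-1}, then by Chebyshev's inequality

\Pr\{N_0 < (1 - \alpha) K' (1 - \Omega'(0))\} \le \frac{K' \Omega'(0)(1 - \Omega'(0))}{\left(\alpha K' (1 - \Omega'(0))\right)^{2}}

for any α > 0, where N_0 denotes the number of nodes that store encoded packets; the right-hand side vanishes as K' grows. Therefore, we have K'(1 − e^{-1}) > K nodes that store encoded packets with high probability for sufficiently large n and k.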

We next show that the original k source packets can be recovered from K = k + O(√k ln²(k/δ)) stored packets with probability 1 − δ, by an argument very similar to the one in [20]. When a source packet is decoded (e.g., from stored packets of degree one), we say that all the other encoded packets that contain this source packet are covered. In the decoding process, we call the set of covered encoded packets that have not yet been fully decoded (i.e., not all of the source packets they contain have been decoded) the ripple. The main idea of the proof is to show that the variation of the ripple size is very similar to a random walk, and that the probability that the ripple size deviates from its mean in k steps by Θ(√k) is small [20].

It can be shown that the expected number of stored packets of degree one is θ'R for some constant θ' > 0. Employing a Chernoff bound argument, we can show that with probability at least 1 − δ/3, the initial ripple size due to degree-one packets is at least θR/2 for a suitable constant θ > 0. Then, by the same argument used in the proof of Theorem 17 in [20], it can be shown that, without the contribution of τ(k/R) in Ω, the ripple does not disappear for L = k − 1, ..., R, and the decoding process is successful until R stored packets remain undecoded, with probability at least 1 − δ/3.

Further, as in Proposition 15 in [20], we can show that, using only the contribution of τ(k/R) in Ω, the last R blocks can be decoded with probability 1 − δ/3 when between 2R and R stored packets remain undecoded. This implies that the decoding process completes successfully with probability at least 1 − δ.

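Finally, the decoding complexity is determined by the average degree D of a stored packet:

D = \sum_{i=1}^{k} i\, \Omega'(i) = \sum_{i=1}^{k} i \sum_{d=1}^{k} \binom{k}{i} \left(\frac{d}{k}\right)^{i} \left(1 - \frac{d}{k}\right)^{k-i} \Omega(d)
= \sum_{d=1}^{k} d\, \Omega(d) \sum_{i=1}^{k} \binom{k-1}{i-1} \left(\frac{d}{k}\right)^{i-1} \left(1 - \frac{d}{k}\right)^{k-1-(i-1)}
= \sum_{d=1}^{k} d\, \Omega(d) = O(\ln(k/\delta)),   (16)

where the second equality uses i·C(k, i) = k·C(k−1, i−1), the third holds because the inner sum is a binomial distribution summed over its whole support, and the last equality is due to Theorem 13 in [20]. Since decoding from K = O(k) stored packets requires O(KD) XOR operations, the decoding complexity is O(k ln(k/δ)). □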



From the calculation of Ω'(0), with the Robust or Ideal Soliton distribution, we also have

\Omega'(0) > \frac{1}{2e^{2}}.   (17)
Remark: One interesting implication of (15) and (17) is that, in order to achieve the same performance as that of the original LT codes, more than (1 − e^{-2}/2)^{-1}K ≈ 1.07K nodes, but less than (1 − e^{-1})^{-1}K ≈ 1.58K nodes are required to recover the original k source packets.
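The two constants in the Remark, and the numerical value behind the step Ω'(0) < e^{-1} in the proof of Theorem 8, follow from elementary arithmetic; the short check below is illustrative only. It uses the identity Σ_{d≥2} x^d/[d(d−1)] = (1 − x)ln(1 − x) + x, which follows from 1/[d(d−1)] = 1/(d−1) − 1/d and the power series of −ln(1 − x).

```python
import math

# Lower and upper factors on the number of queried nodes, from (17) and (15).
lower = 1.0 / (1.0 - math.exp(-2) / 2)   # (1 - e^{-2}/2)^{-1}
upper = 1.0 / (1.0 - math.exp(-1))       # (1 - e^{-1})^{-1}

# Limit of the Ideal Soliton part of the bound (15): sum_{d>=2} e^{-d} / (d(d-1)).
x = math.exp(-1)
series = sum(x**d / (d * (d - 1)) for d in range(2, 100))
closed_form = (1 - x) * math.log(1 - x) + x

print(f"{lower:.2f}K .. {upper:.2f}K nodes; series limit = {closed_form:.4f}")
# prints: 1.07K .. 1.58K nodes; series limit = 0.0779
```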

Another main performance metric is the transmission cost 
of the algorithm, which is characterized by the total number of 
transmissions (the total number of steps of k random walks). 

Theorem 9: The total number of transmissions of the LTCDS algorithm is Θ(kn log n).

Proof: In the inference phase of the LTCDS algorithm, the number of transmissions of each random walk is upper bounded by C'n for some constant C' > 0. That is because each node needs to receive the first visiting source packet C_2 times, and by Lemma 4 the mean inter-visit time is Θ(n). In the encoding phase, in order to guarantee that each source packet visits all the nodes, the number of steps of each of the k random walks is required to be Θ(n log n). Since there are k source packets, the total number of transmissions of the algorithm is Θ(kn log n). □

Note that the algorithm proposed in [10] has a similar order of the total number of transmissions. If geometric information is available, as in [9], the complexity can be reduced, e.g., to Θ(k√n log n) for the algorithm proposed in [9].

IV. Raptor Codes Based Algorithms 

Recall that Raptor codes are concatenated codes whose inner codes are LT and outer codes (pre-codes) are traditional erasure correcting codes. For the pre-codes, we use randomized LDPC codes with k inputs and m outputs (m > k). Assume n and k are known or have been estimated at every node. To perform the LDPC coding for k sources in a distributed manner, we again use simple random walks. Each source node first generates b copies of its own source packet, where b follows some distribution P_LDPC defining the LDPC precode. (See [21] for the design of randomized LDPC codes for Raptor codes.) These b copies are then sent into the network by random walks. Each of the remaining n − k nodes in the network chooses to serve as a parity node with probability (m − k)/(n − k). We refer to the parity nodes together with the original (systematic) source nodes as the pre-coding output nodes. All pre-coding output nodes accept a source packet copy with the same probability; consequently, the b copies of a given source packet get distributed uniformly among all pre-coding output nodes. In this way, we have (on average) m pre-coding output nodes, each of which contains an XOR of a random number of source packets. The detailed description of the pre-coding algorithm is given below. After obtaining the m pre-coding outputs, to obtain Raptor codes based distributed storage, we apply the LTCDS algorithm with these m nodes as new sources and an appropriate Ω as discussed in [21].
Definition 10: (Pre-coding Algorithm)
1) Each source node s, s = 1, ..., k, draws a random number b(s) according to the distribution P_LDPC of the predefined LDPC code, generates b(s) copies of its source packet x_s, each carrying its ID and a counter c(x_s) with initial value zero in the packet header, and sends each of them to one of its randomly chosen neighbors.

2) Each of the remaining n − k nodes chooses to serve as a parity node with probability (m − k)/(n − k). These parity nodes and the original source nodes are the pre-coding output nodes. Each pre-coding output node w generates a random number a(w) according to the following distribution:

where \bar{b} = \sum_{b} b\, P_{LDPC}(b).

3) Each node that has packets in its forward queue before the current round sends its HOL packet to one of its randomly chosen neighbors.

4) When a node u receives a packet x with c(x) < C_3 n log n, u puts the packet into its forward queue and increments the counter c(x).

5) Each pre-coding output node w accepts the first a(w) copies of a(w) different source packets with counters c(x) > C_3 n log n, and updates w's pre-coding result each time as

y_w^+ = y_w^- \oplus x.   (18)

If a copy of x_s is accepted, it will not be forwarded any more, and w will not accept any other copy of x_s. When node w completes a(w) updates, y_w becomes its pre-coding packet.

V. Performance Evaluation 

We evaluate the performance of the LTCDS and RCDS algorithms by simulation. Our main performance metric is the successful decoding probability P_s vs. the query ratio.

Definition 11: The query ratio η is the ratio between the number of queried nodes h and the number of sources k:

\eta = h/k.   (19)

Definition 12: (Successful Decoding) We say that decoding is successful if it results in recovery of all k source packets.

For a query ratio η, we evaluate P_s by simulation as follows. Let h = ηk denote the number of queried nodes. We select (uniformly at random) 10% of the \binom{n}{h} possible subsets of size h of the n network nodes, and try to decode the k source packets from each subset. The fraction of times the decoding is successful then gives our estimate of P_s.
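The evaluation procedure can be sketched as follows. The text does not specify the decoder at this point; the sketch assumes the standard peeling (belief-propagation) decoder of LT codes [20], which only needs to know which source packets each queried node XORed, and it samples a fixed number of random h-subsets rather than 10% of all C(n, h) of them. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Set


def peeling_decode(stored: Sequence[Set[int]], k: int) -> bool:
    """True iff peeling recovers all k sources from the given stored packets.

    stored[j] is the set of source indices XORed into the j-th queried packet.
    """
    eqs: List[Set[int]] = [set(s) for s in stored if s]  # empty nodes carry nothing
    decoded: Set[int] = set()
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for eq in eqs:
            eq -= decoded          # substitute the already-recovered sources
            if len(eq) == 1:       # a degree-one packet releases a new source
                decoded |= eq
                progress = True
    return len(decoded) == k


def success_probability(stored_all: Sequence[Set[int]], k: int, eta: float,
                        trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P_s at query ratio eta = h / k, cf. (19); needs h <= n."""
    h = round(eta * k)
    hits = 0
    for _ in range(trials):
        query = rng.sample(range(len(stored_all)), h)   # h distinct nodes
        hits += peeling_decode([stored_all[j] for j in query], k)
    return hits / trials


# Toy network: 5 sources, 10 nodes, half of which store nothing.
toy = [{0}, {0, 1}, {1, 2}, {2, 3}, {3, 4}] + [set() for _ in range(5)]
p_full = success_probability(toy, k=5, eta=2.0, trials=20, rng=random.Random(0))
```

In the toy network, querying all ten nodes always succeeds, whereas querying five succeeds only when the five non-empty nodes happen to be chosen, which mirrors the effect of empty nodes discussed after Fig. 5.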

Fig. 2 shows the decoding performance of the LTCDS algorithm with known n and k. For Ω, we chose the Ideal Soliton distribution (2). The network is deployed in A = [0, 5]^2 with density λ = n/25 and system parameter C_1 = 3. From the simulation results, we can see that when the query ratio is above 2, the successful decoding probability P_s is about 99%. When n increases but k/n and η remain constant, P_s increases when η > 1.5 and decreases when η < 1.5. This is because, when there are more nodes, it is more likely that the degrees of the stored packets follow the Ideal Soliton distribution.

In Fig. 3 we fix η to 1.4 and 1.7 and k/n = 0.1. From the results, it can be seen that as n increases, P_s increases until it reaches a plateau, which is the successful decoding probability of LT codes.



Fig. 2. Performance of LTCDS with known n and k for (a) n = 200, k = 20; (b) n = 500, k = 50; and (c) n = 1000, k = 100.



Fig. 4. Performance of LTCDS algorithm with small number of nodes and sources for (a) known n = 100 and k = 10; (b) known n = 200 and k = 20; (c) unknown n = 100 and k = 10; (d) unknown n = 200 and k = 20.

Fig. 3. Performance of LTCDS with different known n and k and fixed number of queried nodes for two cases: (a) η = 1.4; (b) η = 1.7.



We compare the decoding performance of LTCDS with known and unknown values of n and k in Fig. 4 and Fig. 5. The network is deployed in A = [0, 5]^2, and the system parameter is set to C_1 = 10. To ensure that each node obtains accurate estimates of n and k, we set C_2 large enough, C_2 = 50. The decoding performance of the LTCDS algorithm with unknown n and k is slightly worse than that of the LTCDS algorithm with known n and k when η is small, and almost the same when η is large. This difference between the two algorithms becomes marginal as the number of nodes and sources increases, as shown in Fig. 5.

An interesting observation in Figs. 2, 4, and 5 is that the probability of successful decoding is almost zero until we query somewhat more than k nodes. This is due to the nodes that store no information in the network. As we pointed out in the Remark after the proof of Theorem 8, for the Robust Soliton distribution, more than 1.07k but less than 1.58k nodes need to be queried to achieve the same performance as LT codes. Similar results also hold for the Ideal Soliton distribution.

To investigate how the system parameter C_1 affects the decoding performance of the LTCDS algorithm with known n and k, we fix η and vary C_1. The simulation results are shown in Fig. 6. When C_1 ≥ 3, P_s remains almost constant, which indicates that after 3n log n steps, almost all source packets have visited each node at least once.

Fig. 5. Performance of LTCDS algorithm with large number of nodes and sources for (a) known n = 500 and k = 50; (b) known n = 1000 and k = 100; (c) unknown n = 500 and k = 50; (d) unknown n = 1000 and k = 100.

Furthermore, to investigate how the system parameter C_2 affects the decoding performance of the LTCDS algorithm, we fix η and C_1 and vary C_2. From Fig. 7 we can see that when C_2 is small, the performance of the LTCDS algorithm is very poor. This is due to the inaccurate estimates of k and n by each node. When C_2 is large, for example, when C_2 > 30, the performance is almost the same.

Fig. 8 and Fig. 9 show the histograms of the estimation results for n and k, based on equations (13). As expected, the estimates of k are more accurate and concentrated than the estimates of n.